39 research outputs found
A Novel Transformer Network with Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting
Earth Observatory is a growing research area that can capitalize on the
power of AI for short-term forecasting, a nowcasting scenario. In this work,
we tackle the challenge of weather forecasting using a video transformer
network. Vision transformer architectures have been explored in various
applications, with major constraints being the computational complexity of
attention and data-hungry training. To address these issues, we propose the
use of Video Swin-Transformer, coupled with a dedicated augmentation scheme.
Moreover, we employ gradual spatial reduction on the encoder side and
cross-attention on the decoder. The proposed approach is tested on the
Weather4Cast2021 weather forecasting challenge data, which requires the
prediction of future frames 8 hours ahead (4 per hour) from an hourly weather
product sequence. The dataset was normalized to 0-1 to facilitate using the
evaluation metrics across different datasets. The model achieves an MSE score
of 0.4750 when provided with training data, and 0.4420 during transfer learning
without using training data.
Comment: 16 pages, 7 figures, 7 tables
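As a rough illustration of the evaluation described above, the following sketch scales values to the 0-1 range and computes the mean squared error between a prediction and a target. The abstract does not say how the normalization was performed, so min-max scaling is an assumption here, and the helper names (`min_max_normalize`, `mse`) and toy numbers are hypothetical rather than the challenge's official scoring code.

```python
def min_max_normalize(values):
    """Scale a flat list of numbers to the range [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input carries no range information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mse(pred, target):
    """Mean squared error between two equally long flat sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy example: three "pixels" of a predicted frame versus the ground truth.
target = min_max_normalize([2.0, 4.0, 6.0])
pred = [0.0, 0.5, 0.0]
score = mse(pred, target)
```

Because both prediction and target live in the same 0-1 range, scores computed this way are comparable across datasets with different raw value ranges.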
Computation of Heterogeneous Object Co-embeddings from Relational Measurements
Dimensionality reduction and data embedding methods generate low-dimensional representations of a single type of homogeneous data objects. In this work, we examine the problem of generating co-embeddings or pattern representations from two different types of objects within a joint common space of controlled dimensionality, where the only available information is assumed to be a set of pairwise relations or similarities between instances of the two groups. We propose a new method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimization dispatched using matrix decomposition, which is also extended to support multidimensional co-embeddings. We also propose a scheme for heuristically reducing the parameters of the model, and a simple way of measuring the conformity between the original object relations and the ones re-estimated from the co-embeddings, in order to achieve model selection by identifying the optimal model parameters with a simple search procedure. The capabilities of the proposed method are demonstrated with multiple synthetic and real-world datasets from the text mining domain. The experimental results and comparative analyses indicate that the proposed algorithm outperforms existing methods for co-embedding generation.
Physics-Driven ML-Based Modelling for Correcting Inverse Estimation
When deploying machine learning estimators in science and engineering (SAE)
domains, it is critical to avoid failed estimations that can have disastrous
consequences, e.g., in aero engine design. This work focuses on detecting and
correcting failed state estimations before adopting them in SAE inverse
problems, by utilizing simulations and performance metrics guided by physical
laws. We suggest flagging a machine learning estimation when its physical model
error exceeds a feasible threshold, and propose a novel approach, GEESE, to
correct it through optimization, aiming at delivering both low error and high
efficiency. The key designs of GEESE include (1) a hybrid surrogate error model
to provide fast error estimations to reduce simulation cost and to enable
gradient-based backpropagation of error feedback, and (2) two generative models
to approximate the probability distributions of the candidate states for
simulating the exploitation and exploration behaviours. All three models are
constructed as neural networks. GEESE is tested on three real-world SAE inverse
problems and compared to a number of state-of-the-art optimization/search
approaches. Results show that it fails the fewest times in
finding a feasible state correction, and generally requires physical
evaluations less frequently.
Comment: 19 pages, the paper is accepted by NeurIPS 2023 as a spotlight
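The flagging step described in this abstract can be sketched as follows. This covers only the threshold check, not GEESE's correction procedure; the function names, the toy forward model `x ** 2`, and the threshold value are all hypothetical and chosen purely for illustration.

```python
def flag_failed(estimates, physical_error, threshold):
    """Return the indices of estimates whose physical-model error exceeds the threshold."""
    return [i for i, x in enumerate(estimates) if physical_error(x) > threshold]


def toy_physical_error(x, observation=4.0):
    """Hypothetical error: distance between a toy forward model x**2 and the observation."""
    return abs(x ** 2 - observation)


# Three candidate state estimates; only the last one violates the toy physics badly.
estimates = [2.0, 2.1, 3.0]
flagged = flag_failed(estimates, toy_physical_error, threshold=0.5)
```

In a real setting the error function would call a simulator or a learned surrogate, and the flagged estimates would then be passed to a correction step.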
Brain Tumor Segmentation in Fluid-Attenuated Inversion Recovery Brain MRI using Residual Network Deep Learning Architectures
Early and accurate detection of brain tumors is
very important to save the patient's life. Brain tumors are
generally diagnosed manually by a radiologist who analyzes the
patient’s brain MRI scans, which is a time-consuming process.
This motivated our search for a way to automate the diagnosis
and increase its speed and accuracy. In this study, we investigate
the use of Residual Network deep learning architectures to diagnose
and segment brain tumors. We propose a two-step method involving a
tumor detection stage, using ResNet50 architecture, and a
tumor area segmentation stage using ResU-Net architecture. We
adopt transfer learning on pre-trained models to help get the
best performance out of the approach, as well as data
augmentation to lessen the effect of data population imbalance
and hyperparameter optimization to get the best set of training
parameter values. Using a publicly available dataset as a testbed,
we show that our approach achieves a Dice coefficient of 84.3%,
outperforming the state-of-the-art U-Net approach by 2%.
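For readers unfamiliar with the metric reported above, the Dice coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened masks; the function name and the convention of returning 1.0 for two empty masks are assumptions, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Predicted tumor mask covers two pixels, ground truth covers one of them.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```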
Data Augmentation Using Generative Adversarial Networks to Reduce Data Imbalance with Application in Car Damage Detection
Automatic car damage detection and assessment
are very useful in alleviating the burden of manual inspection
associated with car insurance claims. This will help filter out any
frivolous claims that can take up time and money to process.
This problem falls into the image classification category and
there has been significant progress in this field using deep
learning. However, deep learning models require a large
number of images for training and oftentimes this is hampered
because of the lack of datasets of suitable images. This research
investigates data augmentation techniques using Generative
Adversarial Networks to increase the size and improve the class
balance of a dataset used for training deep learning models for
car damage detection and classification. We compare the
performance of such an approach with one that uses a
conventional data augmentation technique and with another
that does not use any data augmentation. Our experiment shows
that this approach yields a significant improvement over
training without data augmentation and a slight improvement
over conventional data augmentation.
Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks
In autonomous driving, environment perception
is an important step in understanding the driving scene. Objects
in images captured through a vehicle camera can be detected
and classified using semantic segmentation and depth
estimation methods. Both these tasks are closely related to each
other and this association helps in building a multi-task neural
network where a single network is used to generate both views
from a given monocular image. This approach gives the
flexibility to include multiple related tasks in a single network.
It helps replace multiple independent networks and improve the
performance of all related tasks. The main aim of our research
presented in this paper is to build a multi-task deep learning
network for simultaneous semantic segmentation and depth
estimation from monocular images. Two decoder-focused
U-Net-based multi-task networks are considered, which use a pre-trained
ResNet-50 and DenseNet-121 as shared encoders, with task-specific
decoder networks with attention mechanisms.
We also employed multi-task optimization strategies such as
equal weighting and dynamic weight averaging during the
training of the models. The corresponding models’ performance
is evaluated using mean IoU for semantic segmentation and
Root Mean Square Error for depth estimation. From our
experiments, we found that the performance of these multi-task
networks is on par with the corresponding single-task networks
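To make the weighting strategies mentioned above concrete, the sketch below computes task weights under one common formulation of dynamic weight averaging, in which each task's weight depends on how fast its loss decreased over the previous two epochs. The abstract does not state which variant was used, so the formula, the default temperature of 2.0, and the function name are assumptions. Equal weighting corresponds to every task having weight 1.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Task weights from loss ratios L(t-1) / L(t-2), softmax-normalized to sum to K."""
    ratios = [a / b for a, b in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    num_tasks = len(ratios)
    total = sum(exps)
    return [num_tasks * e / total for e in exps]


# Task 0 (e.g. segmentation) halved its loss; task 1 (e.g. depth) stalled.
weights = dwa_weights(prev_losses=[0.5, 1.0], prev_prev_losses=[1.0, 1.0])
```

In this toy case the stalled task receives the larger weight, which shifts training effort toward the task that is improving more slowly.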
Application of Convolutional Neural Networks for Automated Ulcer Detection in Wireless Capsule Endoscopy Images
Detection of abnormalities in wireless capsule endoscopy (WCE) images is a challenging task. Typically, these images suffer from low contrast, complex backgrounds, and variations in lesion shape and color, which affect the accuracy of their segmentation and subsequent classification. This research proposes an automated system for detection and classification of ulcers in WCE images, based on state-of-the-art deep learning networks. Deep learning techniques, and in particular convolutional neural networks (CNNs), have recently become popular in the analysis and recognition of medical images. The medical image datasets used in this study were obtained from WCE video frames. In this work, two milestone CNN architectures, namely AlexNet and GoogLeNet, are extensively evaluated for classifying objects as ulcer or non-ulcer. Furthermore, we examine and analyze the images identified as containing ulcer objects to evaluate the efficiency of the utilized CNNs. Extensive experiments show that CNNs deliver superior performance, surpassing traditional machine learning methods by large margins, which supports their effectiveness as automated diagnosis tools.
A CAPTCHA model based on visual psychophysics: Using the brain to distinguish between human users and automated computer bots
Demand for the use of online services such as free emails, social networks, and online polling is increasing at an exponential rate. Due to this, online service providers and retailers feel pressurised to satisfy the multitude of end-user expectations. Meanwhile, automated computer robots (known as “bots”) are targeting online retailers and service providers by acting as human users and providing false information in order to abuse their service provisioning. CAPTCHA is a challenge/response protocol that was introduced to protect online retailers and service providers from misuse and automated computer attacks. Text-based CAPTCHAs are the most popular form, and are used by most online service providers to differentiate between human users and bots. However, the vast majority of text-based CAPTCHAs have been broken using Optical Character Recognition (OCR) techniques, which reinforces the need for developing a secure and robust CAPTCHA model. Security and usability are the two fundamental issues that pose a trade-off in the design of a CAPTCHA; a hard CAPTCHA model could also be difficult for human users to resolve, which affects its usability, and vice versa. The model developed in this study uses the unsurpassed abilities of the Human Visual System (HVS) to superimpose and integrate complex information presented in individual frames, using the mechanism of trans-saccadic memory. In this context, the model integrates in its design the concept of persistence of vision, which enables humans to see the world in a continuous fashion. Preliminary results from the proposed model based on this technique are encouraging. To ensure the usability of the proposed CAPTCHA model, we set the threshold for the ORO parameter at 40%. This ensured that our CAPTCHA strings would be recognised by human observers at a rate of over 99% (or as close to 100% as is realistic).
In turn, when examining the robustness of our VICAP model to computer programme attacks, we can observe that for the traditional case of OCR recognition, based on a single-frame scenario, the Computer Recognition Success Rate (CRSR) was about 0%, while in the case of a multi-frame scenario, the CRSR could increase to up to 50%
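The contrast between single-frame and multi-frame recognition can be illustrated with a toy sketch in which each frame carries only part of a binary image and superimposing all frames recovers it. The abstract does not describe how VICAP actually generates its frames, so the random one-pixel-per-frame assignment below, the function names, and the tiny example image are all assumptions made for illustration.

```python
import random


def split_into_frames(image, n_frames, seed=0):
    """Assign each foreground pixel (1) of a binary image to exactly one random frame."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                frames[rng.randrange(n_frames)][y][x] = 1
    return frames


def superimpose(frames):
    """Pixel-wise OR over all frames, imitating integration of successive frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[int(any(f[y][x] for f in frames)) for x in range(width)]
            for y in range(height)]


image = [[1, 0, 1],
         [0, 1, 0],
         [1, 1, 1]]
frames = split_into_frames(image, n_frames=3)
recovered = superimpose(frames)
```

Any single frame exposes only a subset of the character's pixels, while an observer (or an attacker) who integrates several frames sees progressively more of it.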